Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: solve potential deadlock for bsc p2p protocol handshake #1479

Conversation

NathanBSC
Copy link
Contributor

Description

fix potential deadlock by bsc p2p protocol handshake

Rationale

Process description:

  1. runEthPeer wait bsc extension in waitBscExtension
    timeout is 10 seconds
  2. bscHandler.RunPeer, then do handshake protocol, has a smaller timeout
  3. timeout (10 seconds) is reached,
    but ps.lock is got by other go routines, such as bscHandler.RunPeer and registerBscExtension
    they are blocked in such code now
    wait <- peer
    then deadlock happens

Root cause :
func RunPeer is designed as not cost much time before.
but when handshake process added in, handshake also has a timeout,
so ethPeer have to wait more time than before, more close to timeout (10 seconds),
thus leading to improve chance of potential deadlock greatly.

Solve out:
we can move handshake of sub protocol such as bsc and diff
to behind handshake of main protocol

Example

add an example CLI or API response...

Changes

Notable changes:

  • add each change in a bullet point here
  • ...

@NathanBSC NathanBSC force-pushed the fix-bsc-p2p-protocal-handshake-potential-deadlock branch from d9f401e to 03f11f7 Compare April 18, 2023 04:18
@NathanBSC NathanBSC changed the title fix potential deadlock for bsc p2p protocol handshake fix: solve potential deadlock for bsc p2p protocol handshake Apr 18, 2023
@@ -458,7 +458,12 @@ func (ps *peerSet) registerPeer(peer *eth.Peer, ext *snap.Peer, diffExt *diff.Pe
eth.trustExt = &trustPeer{trustExt}
}
if bscExt != nil {
eth.bscExt = &bscPeer{bscExt}
if err := bscExt.Handshake(); err == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this seem against the design of geth, as if every protocol follow the same pattern, the handshake of the different protocol will happen serially.

Copy link
Contributor Author

@NathanBSC NathanBSC Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the design of geth has no handshake exclude eth protocol,
waitBscExtension and registerBscExtension can end quickly
when handshake of sub protocol involved in above two , it' begin to slow down and lead to problem

@@ -74,6 +86,11 @@ func MakeProtocols(backend Backend, dnsdisc enode.Iterator) []p2p.Protocol {
// Handle is the callback invoked to manage the life cycle of a `bsc` peer.
// When this function terminates, the peer is disconnected.
func Handle(backend Backend, peer *Peer) error {
select {
case <-peer.handshaked:
Copy link
Collaborator

@unclezoro unclezoro Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems not extensible, as the channel is only used by bsc protocol, it wont work if another extension is needed in the future.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better snap and bsc share the same pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snap has no handshake, it's easy.
all problem begin from a sub protocol need handshake

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems not extensible, as the channel is only used by bsc protocol, it wont work if another extension is needed in the future.

another sub protocol need handshake should have it's own handshaked channel

@brilliant-lx
Copy link
Collaborator

Try to illustrate how the deadlock happen:

  • routine of the BSC protocol
    image

  • routine of the Eth protocol
    image

@NathanBSC
Copy link
Contributor Author

NathanBSC commented Apr 19, 2023

I will shutdown this PR
another PR with less change will solve this problem.
#1484

@NathanBSC NathanBSC closed this Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants